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Abstract 

Astronomical data is often uncertain with errors that are heteroscedastic (different for each data point) 
and covariant between different dimensions. Assuming that a set of D-dimensional data points can be 
described by a (D — l)-dimensional plane with intrinsic scatter, we derive the general likelihood function 
to be maximised to recover the best fitting model. Alongside the mathematical description, we also release 
the HYPER-FIT package for the R statistical language (github.com/asgr/hyper.fit) and a user-friendly web 
interface for online fitting (hyperfit.icrar.org). The hyper-fit package offers access to a large number of 
fitting routines, includes visualisation tools, and is fully documented in an extensive user manual. Most of 
the HYPER-FIT functionality is accessible via the web interface. In this paper we include applications to 
toy examples and to real astronomical data from the literature: the mass-size, Tully-Fisher, Fundamental 
Plane, and mass-spin-morphology relations. In most cases the hyper-fit solutions are in good agreement 
with published values, but uncover more information regarding the fitted model. 

Keywords: statistics - fitting 


1 Introduction 

There is a common conception that the method used 
for fitting multidimensional data is a matter of subjec¬ 
tive choice. However, if the problem is mathematically 
well-defined, the optimal fitting solution is, in many 
cases, unique. This paper presents the optimal maxi¬ 
mum likelihood fitting solution (with the caveats that 
a maximum likelihood fitting approach entails) for the 
frequently encountered case schematised in Figure 1, 
where D-dimensional data (i) have multivariate Gaus¬ 
sian uncertainties (these may be different for each data 
point), and (ii) randomly sample a (D — l)-dimensional 
plane (a line if D = 2, a plane if D = 3, a hyperplane 
if D > 4) with intrinsic Gaussian scatter. In particu¬ 
lar, the data have no defined predictor/response vari¬ 
able. Hence standard regression analysis does not apply, 
and in general it would yield different geometric solu¬ 
tions depending on axis-ordering (though see von Tou- 
ssaint 2015, which presents the necessary prior modifi¬ 
cations to make traditional intrinsic-scatter-free regres¬ 
sion analysis rotationally invariant). 

Within the assumptions above, the (D — 1)- 
dimensional plane that best explains the observed 
data is unique and can be fit using a traditional 
likelihood method. In this paper we present the general 
D-dimensional form of the likelihood function and 



Figure 1. Schematic representation of the linear model (blue) 
fitted to the data points (red). Both the model and the data 
are assumed to have Gaussian distributions, representing intrinsic 
model scatter and statistical measurement uncertainties, respec¬ 
tively. In both cases, Icr-contours are shown as dashed lines. 


release a package for the R statistical programming 
language (hyper-fit) that optimally fits data using 
this likelihood approach. We also release a user-friendly 
web interface to run hyper-fit online and apply our 
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algorithm to common relations in galaxy population 
studies. 

Other authors have investigated the problem of re¬ 
gression and correlation analysis (Kelly 2007; Hogg 
et ah 2010; Kelly 2011; von Toussaint 2015) with the 
aim of improving on the ad hoc use of standard linear or 
multi-variate regression, reduced analysis, computa¬ 
tionally expensive (and usually unnecessary) bootstrap 
resampling, and the ‘bisector method’ that is sometimes 
wrongly assumed to be invariant with respect to an 
arbitrary rotation of the data. Of recent work that is 
familiar in the astronomical community, Kelly (2007) 
presents a comprehensive approach for linear regres¬ 
sion analysis with support for non-detections, and made 
available IDL code that can analyse multi-dimensional 
data with a single predictor variable. More recently 
Hogg et al. (2010) describes the approach to attack¬ 
ing such joint regression problems in two dimensions. 
Taylor et al. (2015) describes an approach to extend 
the work of Hogg et al. (2010) to multiple Gaussian 
mixture models to model the red and blue populations 
in the Galaxy And Mass Assembly survey (Liske et al. 
2015). Here we extend Hogg et al. (2010) to arbitrary 
dimensions and include a suite of tools to aid scientists 
with such analysis. 

In Section 2 we describe the likelihood method for an 
arbitrary number of dimensions and for different pro¬ 
jection systems (to allow for maximum flexibility in its 
application). In Section 3 we describe and publicly re¬ 
lease the HYPER-FIT package for R, a user friendly and 
fully documented package that can be applied to such 
problems. We also introduce a simple web interface for 
the package. In Section 4 we show simple applications of 
HYPER-FIT to toy examples, including the dataset pre¬ 
sented in Hogg et al. (2010). In Section 5 we show more 
advanced examples taken directly from published as¬ 
tronomy papers, where we can compare our fitting pro¬ 
cedure against the relationships presented in the origi¬ 
nal work. In the appendix we add a discussion regrading 
the subtle biases contained in an extraction of model in¬ 
trinsic scatter, and provide a route to properly account 
for a non-uniformly sampled data. We also illustrate 
the implementation of the 2D-fitting problem in hierar¬ 
chical Bayesian inference software, with an illustrative 
model using Just Another Gibbs Sampler (JAGS^). 


2 Mathematical derivation 

This section develops the general theory for fitting data 
points in D dimensions by a (D —l)-dimensional plane, 
such as fitting points in two dimensions by a line (Fig¬ 
ure 1). As described in Section 1 we assume that the 
data represent a random sample of a population exactly 
described by a (I^ — l)-dimensional plane with intrinsic 


^ mcmc-jags.sourceforge.net / 


Gaussian scatter, referred to as the ‘generative model’. 
The objective is to determine the most likely genera¬ 
tive model, by fitting simultaneously for the (D — l)- 
dimensional plane and the intrinsic Gaussian scatter, 
given data points with multivariate Gaussian uncer¬ 
tainties: the uncertainties are (1) different for each data 
point (heteroscedasticity), (2) independent between dif¬ 
ferent data points, and (3) covariant between orthog¬ 
onal directions, e.g. x-errors and ^-errors can be cor¬ 
related. We first present the general method in D di¬ 
mensions in coordinate-free form (§2.1) and in Garte- 
sian coordinates (§2.2). Explicit equations for the two- 
dimensional case are given in §2.3. Subtleties and gen¬ 
eralisations are presented in the appendix. 


2.1 Coordinate-free solution in D dimensions 

Gonsider N data points i = in D dimensions. 

These points are specified by the measured positions 
G relative to a fixed origin O G and Gaussian 
measurement uncertainties expressed by the symmetric 
covariance matrices C^. We can then fully express our 
knowledge about a point i as the probability density 
function (PDF) p(x|xi), describing the probability that 
point i is truly at the position x given the measured 
position x^. In the case where the intrinsic distribution 
of points in any direction is uniform (the case of non- 
uniform point distributions is discussed in Appendix B), 
this PDF is symmetric, i.e. p(x|xi) = p(xi|x), and reads 


p(xi|x) = 


V(27r)^|Ci| 




( 1 ) 


In Figure 1, these PDFs are shown as red shading. 

The data points are assumed to be randomly sampled 
from a PDF, called the generative model, defined by a 
(D —l)-dimensional plane l-L C (a line if D = 2, a 
plane if D = 3, a hyperplane if D > 4) with Gaussian 
scatter a±_ orthogonal to R. Note that the scatter re¬ 
mains Gaussian along any other direction, not parallel 
to R. The blue shading in Figure 1 represents this gen¬ 
erative model, with the solid line indicating the central 
line R and the dashed lines being the parallel lines at a 
distance a± from R. The plane R is fully specified by 
the normal vector n _LR going from O to R^ and the 
distance of a point x G from R equals |h^x — n|, 
where n = |n| and h = n/n. Thus, R is the ensemble of 
points X that satisfy 


h~^x — n = 0 . 


( 2 ) 


Given this parametrisation, the generative model is 
fully described by the PDF 


/>m(x) 
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(3) 
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expressing the probability of generating a point at po¬ 
sition X. Note that we here assume that the generative 
model is infinitely uniform along any dimension in the 
plane. If the selection function was, e.g., a power law dis¬ 
tribution, then p(x|xi) / p(xi|x), and Equation 3 can¬ 
not be a simply stated. Such scenarios are discussed in 
Appendix B. 

The likelihood of measuring data point i at position 
Xi then equals 

1 

Ci= dxp(xi|x)p^(x) = — e , 

w 27r5^ ■ 

( 4 ) 

where = h~^C^h. Following the Bayesian the¬ 

orem and assuming uniform priors on h and cr_L, the 
most likely model maximises the total likelihood C = 
^^4 hence its logarithm ln£. Up to an addi¬ 
tive constant ^ ln(27r). 


1 ^ 


i=l 




(n'''xi - nf- 


( 5 ) 


Note that Eq. (5) is manifestly rotationally invariant in 
this coordinate-free notation, as expected when fitting 
data without a preferred choice of predictor / response 
variables. For the purpose of numerical optimisation, 
free parameters in the form of unit vectors are rather 
impractical, because of their requirement to have a fixed 
norm. It is therefore helpful to express Eq. (5) entirely 
in terms of n, without using h and n. 


1 ^ 

ln£=-lW 
2 ^ 

i=l 

In summary, assuming that the data samples a lin¬ 
ear generative model described by Eq. (3), the model- 
parameters that best describe the data are found by 
maximising the likelihood function given in Eq. (5) or 
Eq. (6). 


In C7 I + 


n' GoW 


nTn 


+ 


(n'[x,-n])^ 


rinTn+nTC 


(6) 


2.2 Notations in Cartesian coordinates 

For plotting and further manipulation it is often use¬ 
ful to work in a Cartesian coordinate system with D 
orthogonal coordinates j = 1,..., D. The typical expres¬ 
sion of a linear model in such coordinates is 

xd = T:f~i^ajXj + /?, ( 7 ) 

where Xj is the j-th coordinate of x = (xi, ...,xd)~^, 
aj = dxDldxj is the slope along the coordinate j, 
and P is the zero-point along the coordinate D. 
Eq. (7) can be rewritten as a^x + ^ = 0, where = 
(ofi,..., Ofiv-i, —1) is a normal vector of the (D — l)- 
dimensional plane. The Gaussian scatter about this 
plane is now parametrised with the scatter ajj along 
the coordinate D. A vector calculation shows that the 


transformation between the coordinate-free model pa¬ 
rameters {n, and the coordinate-dependent param¬ 
eters takes the form 

n = —Bol{ol^ol)~^ 

> T ,- 1/2 ( 8 ) 

ol) ' 

or, inversely, 

Oi = —nlriD 

P = {\Xn)/nD (9) 

(Xd = (T_L(n"^n)^/^/|nr,|. 

In practice, one can always fit for {n, by max¬ 
imising Eq. (6) and then convert {n, (t±_} to {a, /3, cfd}^ 
if desired. Alternatively, we can rewrite Eq. (6) directly 
in terms of the parameters {ol^P^ctd} as 


1 ^ 


i=l 


In 


oJ OL 


+ OL^CiOL 


(g^Xj F PY ~ 

+ CX^CiCX 


( 10 ) 

however, this formulation of the likelihood is prone 
to optimisation problems since the parameters diverge 
when H becomes parallel to the coordinate D. 


2.3 Two-dimensional case 


Let us now consider the special case of N points in 
D = 2 dimensions (Figure 1), adopting the standard 
'xy^ notation, where x replaces xi and y replaces X 2 = 
Xd- The data points are specified by the positions 
x^ = {xi^yi)^ and heteroscedastic Gaussian errors (j^^i 
and (jy^i with correlation coefficients q; thus the covari¬ 
ance matrix elements are {Gi)xx = iCi)yy = (Xli, 
{^i)xy = (^xy,i = Ci(^x,i(^y,i- Thcsc poiuts are to be fitted 
by a line 

y = ax + P, (11) 


which we can also parameterise by the normal vector 
n = n(cos 0, sin0)~^ going from the origin to the line. 
The parameters o, /d, and (jy {ay = an being the intrin¬ 
sic scatter along y) are then given by (following Eqs. 9) 

“1 n a± 

(X=- -T, 'Xy= . (12) 

tan0 sm0 |sm0| 

which manifestly diverge as the line becomes vertical 
(0 G ttZ). In these notations, Eqs. (5) and (10) simplify 
to 


1 

2 



i=l 

N r 


E 




+1 

In —- 


{xi cos (pFyi sin 0 —n)^ 


{axi -yi -h/3)^ 


(13) 


where 5^ • = + a^ - cos^0 -h a^ - sin^0 + 

‘^crxy,i cos 0 sin 0 and Sy - = + ^y,i ~ ‘^^xy,i<^ 
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(reminder: ay is the intrinsic vertical scatter of the 
model, whereas ay^i is the vertical uncertainty of point 
i). The most likely fit to the data is obtained by 
maximising Eq. (13), either through adjusting the 
parameters (0,n,cr^) or (a^P^ay). Following Ap¬ 
pendix A, unbiased estimators for the population-model 
(as opposed to the sample-model) of the scatter a± 
or ay and its variance a‘j_ or ay can be obtained via 
Eqs. (A2) and (Al), respectively. 

It is worth stressing that the exact Eq. (13) never 
reduces to the frequently used, simple minimisation of 
= ^iiVi — OiXi — not even if the data has zero 
uncertainties and if the intrinsic scatter is fixed. This 
reflects the fact that the y^-minimisation breaks the ro¬ 
tational invariance by assuming a uniform prior for the 
slope a (rather than the angle 0). The term \n{a^ + 1) 
in Eq. (13) is needed to recover the rotational invari¬ 
ance. 

3 Numerical implementation 

Having established the above mathematical framework 
for calculating the likelihood for a multi-dimensional 
fit, we developed a fully documented and tested R 
package called hyper-fit. This is available through a 
github repository^ and can be installed from within R 
with the following commands 

> install.packages("magicaxis”, 
dependencies=TRUE) 

> install .packages(''MASS'', dependencies=TRUE) 

> install.packages("rgl", dependencies=TRUE) 

> install.packages("devtools") 

> library(devtools) 

> install_github ("asgr/LaplacesDemon") 

> install_github (" asgr/hyper. fit") 

To load the package into your R session run 

> library(hyper.fit) 

The core function that utilises the maths presented 
in the previous Section is hyper. like, which calculates 
the likelihood (Eq. (6)) for a given set of hyperplane pa¬ 
rameters and a given dataset. This dataset can be error- 
free, or include covariant and heteroscedastic errors. To 
fit a hyperplane the hyper. like function is accessed by 
hyper.fit, a utility fitting function that attempts to 
optimally fit a generative model to the data. The pack¬ 
age includes a number of user-friendly summary and 
plotting outputs in order to visualise the fitted model 
and investigate its quality. The full 35 page manual de¬ 
tailing the HYPER-FIT package can be viewed at hyper- 
fit, icrar.org, and is included within the help files of the 


^ https: //github.com/asgr/hyper.fit 


package itself (e.g. see > ?hyper .fit within R once the 
package is loaded.). 

3.1 hyper.like 

The basic likelihood function hyper. like requires the 
user to specify a vector of model parameters parm, an 
N X D dimensional position matrix X (where N is the 
number of data points and D the dimensionality of the 
dataset, i.e. 10 rows by 2 columns in the case of 10 
data points with x and y positions) and di D x D x N 
array covarray containing the covariance error matrix 
for each point stacked along the last index. A simple 
example looks like 

> hyper.like(parm, X, covarray) 

parm must be specified as the normal vector n that 
points from the origin to the hyperplane concatenated 
with the intrinsic scatter a± orthogonal to the hyper¬ 
plane. E.g. if the user wants to calculate the likelihood 
of a plane in 3D with equation z = 2x ^ 3y ^ 1 and in¬ 
trinsic scatter along the z axis of 4, this becomes a nor¬ 
mal vector with elements [-0.143,-0.214,0.071] with 
intrinsic scatter orthogonal to the hyperplane equal 
to 1.07. In this case parm=c(-0.143, -0.214, 0.071, 
1.07), i.e. as expected, a 3D plane with intrinsic scat¬ 
ter is fully described by 4 (= D + 1) parameters. Be¬ 
cause of the large number of coordinate systems that 
unambiguously define a hyperplane (and many of them 
might be useful in different circumstances) the hyper- 
fit package also includes the function hyper. convert 
that allows the user to convert between coordinate sys¬ 
tems simply. In astronomy the Euclidian z = a[l]x ^ 
a[2]y -\- P system is probably the most common (e.g. 
Fundamental Plane definition Faber 1987; Binney & 
Merrifield 1998), but this has limitations when it comes 
to efficient high dimensional hyperplane fitting, as we 
discuss later. 

3.2 hyper.fit 

In practice, we expect most user interaction with 
the HYPER-FIT package will be via the higher level 
hyper.fit function (hence the name of the package). 
For example, to find an optimal fit for the data in 
the N X D dimensional matrix X (see §3.1) with no 
specified error, it suffices to run 

> fit=hyper.fit(X) 

This function interacts with the hyper. like func¬ 
tion and attempts to find the best generative hy¬ 
perplane model via a number of schemes. The user 
has access to three main high level fitting routines: 
the base R optim function (available with all in- 
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stallations), the LaplaceApproximation function and 
the LaplacesDemon function (both available in the 
LAPLACESDEMON package). 

3.2.1 optim 

optim is the base R parameter optimisation routine. 
It allows the user to generically find the maximum or 
minimum of a target function via a number of built- 
in schemes that are popular in the statistical opti¬ 
misation community. The default option is to use a 
Nelder-Mead (Nelder & Mead 1965) optimiser that uses 
only function values. This is robust but relatively slow, 
with the advantage that it will work reasonably well 
for non-differentiable functions, i.e. it does not com¬ 
pute the local hessian at every step. Conjugate gradi¬ 
ent (Hestenes & Stiefel 1952); quasi-Newtonian BFGS 
(Broyden, Fletcher, Goldfarb and Shanno; Fletcher 
1970); and simulated annealing (Belisle 1992) are some 
of the other popular methods available to the user. All 
of these schemes are fully documented within the optim 
manual pages that is included with any standard R im¬ 
plementation. 

3.2.2 LaplaceApproximation 

To offer the user more flexibility we also allow 
access to the non-base open-source MIT-licensed 
LAPLACESDEMON package that users can install from 
github.com/asgr/LaplacesDemon (the direct R instal¬ 
lation code is provided above). It is developed by Sta- 
tisticat (LLG) and currently contains 18 different opti¬ 
misation schemes. Many of these are shared with optim 
(e.g. the default option of Nelder-Mead; conjugate gra¬ 
dient descent), but many are unique to the package (e.g. 
Levenberg-Marquardt, Marquardt 1963; particle swarm 
optimisation, Kennedy & Eberhart 1995). 

The LaplaceApproximation function offers a host of 
high level user tools, and with these object oriented ex¬ 
tensions it tends to be slower to converge compared to 
the same method in optim. The large range of methods 
means the user should be able to find a robustly con¬ 
verged solution in most situations, without having to 
resort to much more computationally expensive MGMG 
methods accessed through the LaplacesDemon function 
which is also available in the LAPLACESDEMON package. 

The LaplaceApproximation function and the wider 
LAPLACESDEMON package is extensively documented, 
both in the bundled manual and at the www.bayesian- 
inference.com website ^ maintained by the package de¬ 
velopers. 

3.2.3 LaplacesDemon 

The final hyperplane fitting function that users of 
HYPER-FIT have access to is LaplacesDemon which is 

^https://web.archive.org/web/20141224051720/ 
http: / / www.bayesian-inference.com / index 


also included in the LAPLACESDEMON package. The 
function includes a suite of 41 Markov-Ghain Monte- 
Garlo (MGMG) routines, and the number is still grow¬ 
ing. For HYPER-FIT, the default routine is Griddy-Gibbs 
(Ritter & Tanner 1992), a component-wise algorithm 
that approximates the conditional distribution by eval¬ 
uating the likelihood model at a set of points on a grid. 
This proved to have attractive qualities and well be¬ 
haved systematics for relatively few effective samples 
during testing on toy models. The user also has ac¬ 
cess to a large variety of routine families, including 
Gibbs; Metropolis-Hasting (Hastings 1970); slice sam¬ 
pling (Neal 2003); and Hamiltonian Monte Garlo (Du¬ 
ane et al. 1987). 

In practice the user should investigate the range of 
options available, and the optimal routine for a given 
problem might not simply be the default provided. 
To aid the user in choosing an appropriate scheme, 
the LAPLACESDEMON package includes a comprehensive 
suite of analysis tools to analyse the post convergence 
Markov-Ghains. The potential advantages to using an 
MGMG approach to analyse the generative likelihood 
model are manifold, allowing the user of hyper-fit 
to investigate complex high-dimensional multi-modal 
problems that might not be well-suited to the tradi¬ 
tional downhill gradient approaches included in optim 
and LaplaceApproximation. Again, the function is ex¬ 
tensively described both in the LAPLACESDEMON pack¬ 
age and on the www.bayesian-inference.com website 

3.3 plot 

The HYPER-FIT package comes with a class sensitive 
plotting function where the user is able to execute a 
command such as 

> plot(hyper.fit(X)) 

where X is a 2 x N or 3 x N matrix. The plot 
displays the data with the best fit. If (uncorrelated 
or correlated) Gaussian errors are given, e.g. via 
plot (hyper .fit (XjCovarray)), they are displayed 
as 2D ellipses or 3D ellipsoids, respectively. The 
built-in plotting only works in 2 or 3 dimensions (i.e. 
x,y or x,y,z data), which will likely be the two most 
common hyperplane fitting situations encountered in 
astronomical applications. 

3.4 summary 

The HYPER-FIT package comes with a class sensitive 
summary function where the user is able to execute a 


^https://web.archive.org/web/20141224051720/ 
http: / / www.bayesian-inference.com / index 
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command such as 

> summary (hyper. fit (X) ) 

where X can be any {D x A/^)-array with D >2 
dimensions. Using summary for 2D data will produce 
an output similar to 

> summary(fitnoerror) 

Call used was: 

hyper.fit(X = cbind(xval, yval)) 

Data supplied was 5rows x 2cols. 

Requested parameters: 
alphal beta.vert scat.vert 
0.4680861 0.6272718 0.2656171 

Errors for parameters: 

err_alphal err_beta.vert err_scat.vert 

0.12634307 0.11998805 0.08497509 

The full parameter covariance matrix: 
alphal beta.vert scat.vert 

alphal 0.015962572 -0.0021390417 0.0016270601 
beta.vert -0.002139042 0.0143971319 -0.0002182796 
scat.vert 0.001627060 -0.0002182796 0.0072207660 

The sum of the log-likelihood for the best fit 
parameters: 

LL 

4.623924 

Unbiased population estimator for the intrinsic 
scatter: 
scat.vert 
0.3721954 

Standardised parameters vertically projected along 
dimension 2 (yval): 
alphal beta.vert scat.vert 
0.4680861 0.6272718 0.2656171 

Standardised generative hyperplane equation with 
unbiased population estimator for the intrinsic 
scatter, vertically projected along dimension 2 
(yval): 

yval ~ N(mu= 0.4681*xval + 0.6273, sigma= 0.3722) 


3.5 Web interface 

To aid the accessibility of the tools we have developed in 
this paper, a high level web interface to the hyper-fit 
package has been developed that runs on a permanent 
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Analyse Data Methods Info Hyper Rt Paper Hyper Rt Manual 

IM Recalculate Using example 

Fit Options yv 
Plot Options ^ 

Upload Data v 

Choose file to upload 
chooss File No file Chosen 

Column Names 
g Use Show Example 



Example Data v 


g hogg S GAMAsmVsize 

S TFR s MJB 


Example File 


Make your Own Fit 


"x" "y” "sx” "sy" "corxy’* 
-1.22 -0.15 0.12 0.13 0.90 
-0.78 0.49 0.14 0.03 -0.40 
0.44 1.17 0.20 0.07 -0.25 
1.01 0.72 0.07 0.11 0.00 
1.22 1.22 0.06 0.08 -0.20 


Choose data from Example Data or upload your own 
data under Upload Data. 


Figure 2. The front page view of the hyper-fit web tool avail¬ 
able at hyperfit.icrar.org. The web tool allows users to interact 
with HYPER-FIT through a simple GUI interface, with nearly all 
HYPER-FIT functionality available to them. The code itself runs re¬ 
motely on a machine located at ICRAR, and the user’s computer 
is only used to render the web site graphics. 


virtual server using R Shiny^. The hyper-fit compu¬ 
tations are executed remotely, and the web interface 
itself is in standard JavaScript and accessible using any 
modern popular web browser. This website interface to 
HYPER-FIT can be accessed at hyperfit.icrar.org. The 
HYPER-FIT website also hosts the latest version of the 
manual, and the most recent version of the associated 
HYPER-FIT paper. The design philosophy for the page 
is heavily influenced by modern web design, where the 
emphasis is on clean aesthetics, simplicity and intuitive¬ 
ness. The front page view of the web interface is shown 
in Figure 2. 

The HYPER-FIT web tool allows access to nearly all 
of the functionality of hyper-fit, with a few key re¬ 
strictions: 

• Only 2D or 3D analysis available (R hyper-fit 
allows for arbitrary A^D analysis) 

• Uploaded files are limited to 2,000 or fewer row 
entries (R hyper-fit allows for hard-disk limited 
size datasets) 

• When using the LaplaceApproximation or 
LaplacesDemon algorithms only 20,000 iterations 
are allowed (R hyper-fit has no specific limit on 
the number of iterations) 

• A number of the more complex LaplacesDemon 
methods are unavailable to the user because the 
specification options are too complex to describe 
via a web interface (R hyper-fit allows access to 
all LaplacesDemon methods) 


^http://shiny.rstudio.com/ 










Hyper-Fit 


7 



xval 

Figure 3. 2D fit with no errors. Figure shows the default plot 
output of the hyper.plot2d function (accessed via the class spe¬ 
cific plot method) included as part of the R hyper-fit pack¬ 
age, where the best generative model for the data is shown as 
a solid line with the intrinsic scatter indicated by dashed lines. 
The colouring of the points shows the ‘tension’ with respect to 
the best-fit linear model and the measurement errors (zero in this 
case), where redder colours indicate data that is less likely to be 
explainable. 

In practice, for the vast majority of typical use cases 
these restrictions will not limit the utility of the analysis 
offered by the web tool. For the extreme combination 
of 2,000 rows of data and 20,000 iterations of one of the 
LaplacesDemon MCMC samplers users might need to 
wait a couple of minutes for results. 

4 Mock examples 

In this Section we give some examples using two sets of 
idealised mock data. 

4.1 Simple Examples 

To help novice users get familiar with R and the 
HYPER-FIT package we start this Section by building 
the basic example shown in Figure 1. We first create 
the central x and y positions of the 5 data points, 
encoded in the 5-element vectors x_val and y-val, 

> xval = c(-1.22, -0.78, 0.44, 1.01, 1.22) 

> yval = c(-0.15, 0.49, 1.17, 0.72, 1.22) 

We can can obtain the hyper-fit solution and plot 
the result (seen in Figure 3) with 

> fitnoerror=hyper.fit(cbind(xval, yval)) 

> plot (f itnoerror) (Figure 3) 

Since this example has no errors associated with 
the data positions the generative model created has 
a large intrinsic scatter that broadly encompasses 



in 

d 

I 


- 2-10 1 2 
xval 

Figure 4. 2D fit with uncorrelated (between x and y) errors. 
The errors are represented by 2D ellipses at the location of the xy- 
data. See Figure 3 for further details on this Figure. The reduction 
in the intrinsic scatter required to explain the data compared to 
Figure 3 is noticeable (see the dashed line intersects on the y- 
axis). 


the data points (dashed lines in Figure 3). Extending 
the analysis we can add x-errors x_err (standard 
deviations of the errors along the x-dimension) and 
^-errors y_err and recalculate the fit and plot (Figure 
4) with 

> xerr = c(0.12, 0.14, 0.20, 0.07, 0.06) 

> yerr = c(0.13, 0.03, 0.07, 0.11, 0.08) 

> fitwitherror=hyper.fit(cbind(xval, yval), 
vars=cbind(xerr, yerr)''2) 

> plot (f itwitherror) (Figured) 

Since we have added errors to the data, the intrinsic 
scatter required in the generative model is reduced. 
To extend this 2D example to the full complexity 
of Figure 1 we assume that we know xy_cor (the 
correlation between the errors for each individual data 
point) and recalculate the fit and plot (Figure 5) with 

> xycor = c(0.90, -0.40, -0.25, 0.00, -0.20) 

> fitwitherrorandcor=hyper.fit(cbind(xval, 
yval), covarray=makecovarray2d(xerr, yerr, 
xycor)) 

> plot (f itwitherrorandcor) (Figure 5) 

The effect of including the correlations can be quite 
subtle, but inspecting the intersections on the ^-axis it 
is clear that both the best-fitting line and the intrinsic 
scatter have slightly changed. To see how dramatic the 
effect of large errors can be on the estimated intrinsic 
scatter we can simply inflate our errors by a large 
factor. We recalculate the fit and plot (Figure 6) with 
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xval 

Figure 5. 2D fit with correlated (between x and y) errors. The 
errors are represented by 2D ellipses at the location of the xy- 
data. See Figure 3 for further details on this Figure. The keen 
reader might notice that this Figure uses data from the Figure 1 
schematic. 



xval 

Figure 6. 2D fit with correlated (between x and y) errors. The 
errors here are a factor 1.9 times larger than used in 5. See Figure 
3 for further details on this Figure. 


> fitwithbigerrorandcor=hyper.fit(cbind(xval, 
yval), covarray=makecovarray2d(xerr*l.9, 
yerr*1.9, xycor)) 

> plot(fitwithbigerrorandcor) (Figure 6) 

Per-data-point error covariance is often overlooked 
during regression analysis because of the lack of readily 
available tools to correctly make use of the informa¬ 
tion, but the impact of correctly using them can be 
non-negligible. As a final simple example we take the 
input for Figure 6 and simply rotate the covariance 
matrix by 90 degrees. We recalculate the fit and plot 
(Figure 7) with 

> fitwithbigerrorandrotcor=hyper.fit(cbind(xval, 
yval), covarray=makecovarray2d(yerr*l.9, 
xerr*1.9, -xycor)) 



- 2-10 1 2 
xval 

Figure 7. 2D fit with correlated (between x and y) errors. The 
errors here are a factor 1.9 times larger than used in 5 and the cor¬ 
relation matrix is rotated by 90 degrees. See Figure 3 for further 
details on this Figure. 

> plot(fitwithbigerrorandrotcor) (Figure 7) 

The impact of simply rotating the covariance matrix 
is to flatten the preferred slope and to slightly reduce 
the intrinsic scatter. 

4.2 Hogg Example 

In the arXiv paper by Hogg et al. (2010) the general 
approach to the problem of generative fitting is ex¬ 
plored, along with exercises for the user. The majority 
of the paper is concerned with 2D examples, and an 
idealised data set (Table 1 in the paper) is included 
to allow readers to recreate their fits and plots. This 
dataset comes bundled with the hyper-fit package, 
making it easy to recreate many of the examples of 
Hogg et al. (2010). Towards the end of their paper they 
include an exercise on fitting 2D data with covariant 
errors. We can generate the hyper-fit solution to this 
exercise and create the relevant plots (Figure 8) with 

> data(hogg) 

> hoggcovarray= makecovarray2d(hogg [-3,''x_err"] , 
hogg[-3,"y_err"] , hogg[-3,"corxy"]) 

> plot(hyper.fit(hogg[-3,c("x", "y")], 

covarray=hoggcovarray) ) (Figure 8) 

The fit we generate agrees very closely with the ex¬ 
ample in Hogg et al. (2010), which is not surprising 
since the likelihood function we maximise is identical, 
bar a normalising factor (i.e. an additive constant in 
ln£). The differences seen between Hogg’s Figure 14 
and the bottom panel of our Figure 8, showing the 
projected posterior chain density distribution of the in¬ 
trinsic scatter parameter, are likely to be due to our 
specific choice of MCMC solver. In our hyper-fit ex- 
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Intrinsic Scatter 


Figure 8. 2D toy data with correlated (between x and y) er¬ 
rors taken from Hogg et al. (2010). The top panel shows the fit 
to the Hogg et al. (2010) data minus row 3, as per exercise 17 
and Figure 13 of Hogg et al. (2010), the bottom panel shows the 
MCMC posterior chains for the intrinsic scatter, as per Figure 14 
of Hogg et al. (2010). The vertical dashed lines specify the 95^^ 
and 99^^ percentile range of the posterior chains, as requested for 
the original exercise. See Figure 3 for further details on the top 
two panels of this Figure. 


ample we use Componentwise Hit-And-Run Metropo¬ 
lis (CHARM, Turchin 1971), being a particularly sim¬ 
ple MCMC for a novice user to specify, and one that 
performs well as long as the number of data points is 
greater than 10. 


5 Astrophysical Examples 

In this Section, we have endeavoured to include a mix¬ 
ture of real examples from published astronomy papers. 
In some cases aspects of the data are strictly proprietary 
(in the sense that the tables used to generate paper 
Figures have not been explicitly published), but the re¬ 
spective authors have allowed us to include the required 
data in our package. 



Figure 9. GAMA mass-size relation data taken from Lange et al 
in prep. See Figure 4 for further details on this Figure. 


5.1 2D Mass-Size Relation 

The galaxy mass-size relation is a topic of great 
popularity in the current astronomical literature (Shen 
et al. 2003; Trujillo et al. 2004; Lange et al. 2015). 

The HYPER-FIT package includes data taken from 
the bottom-right panel of Figure 3 from Lange et al. 
(2015). To do our fit on these data and create Figure 
9 we make use of the published data errors and the 
^/ymax weights calculated for each galaxy by running 

> data(GAMAsmVsize) 

> plot(hyper.fit(GAMAsmVsize[,c("logmstar", 
"logrekpc”)], vars=GAMAsmVsize[,c("logmstar_err", 
"logrekpc_err")] ''2, weights=data[, "weights"])) 
(Figure 9) 

This analysis produces a mass-size relation of 

logio Re ^ A/'[/i = (0.382 ± 0.006) log^o - (3.53 ± 0.06), 
cr = 0.145 ±0.003], 

where Re is the effective radius of the galaxy in 
kpc, JV is the normal distribution with parameters 
11 (mean) and a (standard deviation) and A4* is 
the stellar mass in solar mass units. The relation 
published in Lange et al. (2015) is log^^g Re = 0-46 ± 
0.02(log]^o ~ 4.38 ± 0.03. The function in Lange 

et al. (2015) is noticeably steeper due to being fit to 
the running median with weighting determined by the 
spread in the data and the number of data points in 
the stellar mass bin, rather than directly to the data 
including the error. 

5.2 2D Tully-Fisher Relation 

One of the most common relationships in extragalactic 
astronomy is the so-called Tully-Fisher relation (TFR, 
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Velocity / km/s 

Figure 10. Tully-Fisher data taken from Obreschkow & Meyer 
(2013), with the best-fit hyper-fit generative model for the data 
shown as a solid line and the intrinsic scatter indicated with 
dashed lines. See Figure 3 for further details on this Figure. 


Tully & Fisher 1977). This tight coupling between 
inclination corrected disc rotation velocity and the 
absolute magnitude (or, more fundamentally, stellar 
mass) of spiral galaxies is an important redshift- 
independent distance indicator, as well as a probe 
for galaxy evolution studies. Here we use TFR data 
included in the hyper-fit package that has been taken 
from Figure 4 of Obreschkow & Meyer (2013), where 
we fit the data and create Figure 10 with 

> data(TFR) 

> plot(hyper.fit(TFR[,c("logv", "M_K")], 
vars=TFR[,c("logv_err", "M_K_err")] "2)) (Figure 
10 ) 

This yields a TFR of 

Mk ^ A/'[/i = (-9.3 ± 0.4) logio ^circ - (2.5 ± 0.9), 
cr = 0.22 ±0.04], 

where Mk is the absolute iC-band magnitude and 
Vcirc is the maximum rotational velocity of the disc in 
kms“^. Obreschkow & Meyer (2013) find 

Mk - A7[/i = (-9.3 ± 0.3) logio W - (2.5 ± 0.7), 

(7 = 0.22 ±0.06], 

where all the fit parameters comfortably agree within 
quoted errors. The parameter errors in Obreschkow & 
Meyer (2013) were recovered via bootstrap resampling 
of the data points, whereas hyper-fit used the covari¬ 
ant matrix via the inverse of the parameter Hessian ma¬ 
trix. Using HYPER-FIT we find marginally less constraint 
on the slope and intercept, but slightly more on the in¬ 
trinsic scatter. 


5.3 3D Fundamental Plane 

Moving to a 3D example, perhaps the most common 
application in the literature is the Fundamental Plane 
for elliptical galaxies (Faber 1987; Binney & Merrifield 
1998). This also offers a route to a redshift-independent 
distance estimator, but from a 3D relation rather than 
a 2D relation (although an approximate 2D version 
also exists for elliptical galaxies via the Faber-Jackson 
relation; Faber & Jackson 1976). In hyper-fit we 
include 6dFGS data released in Campbell et al. (2014), 
where we fit the data and generate Figure 11 with 

> data(FP6dFGS) 

> plot(hyper.fit(FP6dFGS[,c("logIe_J", 
"logsigma", "logRe.J")] , 
vars=FP6dFGS[,c("logIe_J_err", 

"logsigma_err", "logRe_J_err")] "2, 
weights=FP6dFGS [, "weights"] )) (Figure 11) 

This yields a Fundamental Plane relation of 

logio ^ — —(0.853 ± 0.005) log^g 7ej 

± (1.508 ± 0.012) logio ^vei - (0.42 ± 0.03), 
(7 = 0.060 ±0.001], 

where Rej (in kpc) is the effect radius in the J-band, 
lej (in mag arcsec“^) is the average surface brightness 
intensity within Rej and cr^eZ (iR kms“^) is the central 
velocity dispersion. Magoulas et al. (2012) find 

logio ^ = -(0.89 ± 0.01) logio 

± (1.52 ± 0.03) logio ^vei - (0.19 ± 0.01), 

(7 = 0.12 ± 0 . 01 ]. 

It is notable that our fit requires substantially less 
intrinsic scatter than Magoulas et al. (2012). It is dif¬ 
ficult to assess the origin of this difference since the 
methods used to fit the plane differ between this work 
and Magoulas et al. (2012), where the latter fit a 3D 
generative Gaussian with 8 degrees of parameter free¬ 
dom, which is substantially more general than the 3D 
plane with only 4 degrees of parameter freedom that 
HYPER-FIT used. 

5.4 3D Mass-Spin-Morphology Relation 

The final example of hyper-fit uses astronomical data 
from Obreschkow & Glazebrook (2014). They measured 
the fundamental relationship between mass (here the 
baryon mass (stars±cold gass) of the disc), spin (here 
the specific baryon angular momentum of the disc) 
and morphology (as measured by the bulge-to-total 
mass ratio). Hereafter this relation is referred to as the 
MJB relation. This dataset is interesting to analyse 
with HYPER-FIT because it includes error correlation 
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Figure 11. 6dFGS Fundamental Plane data and hyper-fit fit. 
The two panels are different orientations of the default plot output 
of the hyper. plotSd function (accessed via the class specific plot 
method) included as part of the R hyper-fit package, where the 
best-fit HYPER-FIT generative model for the data is shown as a 
translucent grey 3D plane. The package function is interactive, 
allowing the user the rotate the data and the overlaid 3D plane 
to any desired orientation. See Figure 3 for further details on this 
Figure. 


between mass and spin, caused by their common 
dependence on distance. In hyper-fit we included 
the MJB data presented in Obreschkow & Glazebrook 
(2014). We can fit these data and generate Figure 12 
with 

> data(MJB) 

> plotChyper.fit(X=MJB[,c("logM", "logj", 
"B.T")], covarray=makecovarray3d(MJB$logM_err, 



Figure 12. Mass-spin-morphology (MJB) data and hyper-fit 
fit. The two panels are different orientations of the default plot 
output. All error ellipsoids overlap with the best-fit 3D plane 
found using hyper-fit, implying that the generative model does 
not require any additional scatter. Indeed the observed data is un¬ 
usually close to the plane, implying slightly overestimated errors. 
See Figure 3 and 11 for further details on this Figure. 


MJB$logj_err, MJB$B.T_err, MJB$corMJ, 0, 0))) 
(Figure 12) 

This leads to a MJB relation of 


B/T - A/'[/i = (0.33 ± 0.03) log^o M 

- (0.34 ± 0.03) logio 3 - (0-04 ± 0.01), 
cr = 0.002 ±0.006], 

where 5/T is the galaxy bulge-to-total ratio, M is the 
disc baryon mass in 10^^ solar masses, and j is the spe- 
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cific baryon angular momentum in 10“^ kpckrns”^ (see 
Obreschkow & Glazebrook 2014, for details on measur¬ 
ing this quantity). Obreschkow & Glazebrook (2014) 
find 

B/T - A/'[/i = (0.34 ± 0.03) log^o M 

- (0.35 ± 0.04) logio j - (0-04 ± 0.02), 

cr = 0]. 

The agreement between hyper-fit and Obreschkow 
& Glazebrook (2014) is excellent, with all parameters 
consistent within quoted la errors. Our results are con¬ 
sistent with zero intrinsic scatter within parameter er¬ 
rors. In fact, running the analysis using MGMG and 
the GHARM algorithm we used hyper-fit to directly 
sample the likelihood of the intrinsic scatter being ex¬ 
actly 0. This is possible because we use the wall condi¬ 
tion that any intrinsic scatter sample below 0 (which is 
non-physical) is assigned to be exactly 0. We find that 
90.4% of the posterior chain samples have intrinsic scat¬ 
ter of exactly 0 post application of the wall condition 
(which imposes a spike + slab posterior), which can be 
interpreted to mean that the data is consistent with 
P(cr = 0) = 0.904. 

This agrees with the assessment in Obreschkow & 
Glazebrook (2014), where they conclude that the 3D 
plane generated does not require any additional scatter 
to explain the observed data, probably indicating that 
the errors have been overestimated^. 

6 Conclusion 

In this paper we have extended previous work (Kelly 
2007; Hogg et al. 2010; Kelly 2011) to describe the gen¬ 
eral form of the D dimensional generative hyperplane 
model with intrinsic scatter. We have also provided the 
fully documented hyper-fit package for the R statis¬ 
tical programming language, available immediately for 
installation via github^. A user friendly Shiny web in¬ 
terface has also been developed and can be accessed 
at hyperfit.icrar.org. The HYPER-FIT version released 
in conduction with this paper is vl.0.2, and regular up¬ 
dates and support are planned. 

To make the package as user-friendly as possible we 
have provided examples for simple 2D datasets, as well 


®If the real generative model is a hyperplane with no scatter then 
we expect to observe P{cr = 0) = U (0,1), where U is the uniform 
distribution. P(cr = 0) can be cast as the frequentist p-value, 
where the null hypothesis is a generative model with no intrin¬ 
sic scatter. The posterior we form via the addition of the wall 
condition is formally not a Bayesian posterior. A fully rigorous 
approach would be to compare Bayes factors for a model with 
(7 = 0 and another model with cr > 0. Using the wall condition 
posterior as described should alert the user to situations where 
such a model comparison is potentially necessary. 

^ git hub. com / asgr / hyper. fit 


as complex 3D datasets utilising heteroscedastic covari¬ 
ant errors taken from Obreschkow & Glazebrook (2014). 
We compared the published fits to the new hyper-fit 
estimates, and in most cases find good agreement for the 
hyperplane parameters (not all original papers attempt 
to make an intrinsic scatter estimate), hyper-fit offers 
users access to higher level statistics (e.g. the parameter 
covariance matrix) and includes simple class sensitive 
summary and plot functions to assist in analysing the 
quality of the generative model. 

The ambition is for hyper-fit to be regularly up¬ 
dated based on feedback and suggestions from the as¬ 
tronomy and statistics community. As such, in the long 
term the documentation included as part of the package 
might supersede some details provided in this paper. 

7 Future Work 

The current R hyper-fit package does not have a 
mechanism for handling censored data, and we cur¬ 
rently implicitly assume a uniform population distribu¬ 
tion along the hyperplane. Both of these issues can arise 
with astronomy data, where typically some fraction of 
sources might be left censored (i.e. below a certain value 
but uncertain where, and usually represented by an up¬ 
per limit) and sampled from an unknown power-law 
type distribution (hyper-fit allows for the descrip¬ 
tion of known power-law selection functions, see Ap¬ 
pendix B). A future extension for hyper-fit will allow 
for multi-parameter censoring, i.e. where a data point 
might exist some subset of parameters (e.g. stellar mass 
and redshift) but only has an upper limit for others (e.g. 
star-formation rate and HI mass). Kelly (2007) allows 
for censoring of the response variable, but not multi¬ 
ple observables simultaneously. It is possible to account 
for such censoring using a hierarchical Bayesian infer¬ 
ence software program, such as BUGS, JAGS or Stan. 
An example JAGS model of the hyper-fit scheme is 
provided in Appendix G. We also aim to allow generic 
distribution functions along and perpendicular to the 
hyperplane including power-laws and Gaussian mixture 
models. Again, using a hierarchical Bayesian inference 
software program is one potential route to achieving 
this, but at the expense of generalisability and simplic¬ 
ity. 
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Figure Al. Convergence tests for simulated generative hyper¬ 
plane datasets. The solid lines show the raw mean intrinsic scatter 
for different types of generative model and fitting, and the dashed 
lines show the intrinsic scatter once the appropriate combination 
of bias and sample-population corrections have been applied. In 
all cases the generative model was simulated with an intrinsic 
scatter for the population equal to 3. 


A Intrinsic scatter estimator bias 

A subtle point is that the expectation of the most likely 
model is not generally identical to the true model Let us 
assume a fixed generative model described by Eq. (3) with 
known parameters and From this model, we ran¬ 

domly draw a sample of N points and estimate the most 
likely values, i.e. the modes, and by maximis¬ 

ing Eq. (5). This procedure is repeated ad infinitum, al¬ 
ways with the same model and the same number of points, 
resulting in infinitely many different estimates and 

By definition, the expectations of these estimators are 
the ensemble-averages (n^^) and The question is 

whether these expectations are identical to the input pa¬ 
rameters and If so, the estimators are called un¬ 

biased, if not, they are biased. The mirror-symmetry of our 
generative model implies that is an unbiased estima- 
W, i.e. (n^L) = n'""". 

By contrast, the intrinsic scatter is generally biased, i.e. 
(cr^^) 7^ There are two reasons for this bias. Firstly, 

maximising the likelihood function C finds the most likely 
generative model of a sample of N data points, which is not 
generally the most likely model of the population, i.e. the 
imaginary infinite set, from which the N points were drawn 
at random. For example, consider a two-dimensional popula¬ 
tion described by a straight line with some intrinsic scatter. 
From this population we draw a sample of two points. No 
matter which two points we choose, they can always be fit¬ 
ted by a straight line without scatter. Hence the most likely 
sample model, maximising Eq. (5), will have zero intrinsic 
scatter, even if the population model has non-zero scatter. 
The transition from the most likely model of the sample to 
the most likely model of the population is achieved by the 


®Note that can become a biased estimator, when the data 

is drawn from the population with a non-uniform selection func¬ 
tion, as discussed in Appendix B. 
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so-called Bessel correction, 


~ 2 




N 


N-D 


2 




(Al) 


where D is the number of parameters of the hyperplane 
(also referred to as the ‘degrees of parameter freedom’ or 
simply ‘degrees of freedom’), which equals the number of 
dimensions. In the special case of Gaussian data with no 
errors (Ci = 0), the variance estimator a]_ is unbiased, 
i.e. In the general case, one can esti¬ 

mate the population variance by computing the mean of 
its marginal posterior PDF. This is often achieved using a 
Markov Chain Monte Carlo (MCMC) approach, but note 
that many MCMC algorithms introduce new biases in the 
way they determine when a Markov Chain becomes stable 
(Cowles et al. 1999). 

Secondly, the non-linear relation between and 
skews the posterior PDF in such a way that the expecta¬ 
tion of the mode is generally smaller than In the 

case of data with no errors (Ci = 0), an unbiased estimator 
can be derived from Cochran’s theorem. 


. _ r(^) 

V 2 


This estimator then satisfles (itx) = hyper-fit in- 

eludes the Bessel and Cochran corrected versions of the in¬ 
trinsic scatter (where appropriate). Figure Al gives three 
example fits for different DoF and different samples N. The 
black and the red solid lines are direct maximum likelihood 
estimation fits. These are the most biased initially, requir¬ 
ing both a Bessel and Cochran correction to give the dashed 
unbiased lines. The blue line shows a MCMC expectation es¬ 
timation, and as such only requires the Cochran correction 
to give the unbiased dashed line. These corrections do not 
properly account for the presence of heteroscedastic errors, 
but they should offer the minimum appropriate correction 
to apply. Bootstrap resampling offers a window to properly 
correct for such data biases, but this can be extremely ex¬ 
pensive to compute. 


B Extension to non-uniform selection 
functions 


We now consider the more general situation, where the pop¬ 
ulation is not sampled uniformly, but with a probability 
proportional to Ps(x), known as the selection function in 
astronomy. In this case, the effective PDF p^/(x) of the 
generative model and the selection function is the product 
Prn(x)ps(x), renormalised along the direction perpendicular 
to 'H, dl Pm'ilR) = 1- The renormalisation is crucial to 
avoid a dependence of the normalisation of p^/(x) on the 
fitted model parameters. The effective PDF reads 


, . _ Pm(x)p^(x) 

fZc,d'lPm{ln)pS^) 
The likelihood of point i then becomes 



(A3) 


(A4) 


Analytical closed-form solutions of Eq. (A4) can be found 
for some functions Ps(x). To illustrate this, let us consider 
the case of an exponential selection function 

P5(x) = e‘®^^, (A5) 

where k is the known (and fixed) sampling gradient, i.e. in 
the direction of k the probability of drawing a point from the 
population increases exponentially. This selection function 
is important in many fields of science, since it represents 
a power-law selection function in the coordinates x, if x is 
defined as x = Inx. This can be seen from 

" = If = If (®"'’)'“' = If A ’ (A6) 

j=l j=l j=l 


which is a generic D-dimensional power-law. 

Substituting pm(x) and Ps(x) in Eq. (A3) for Eqs. (3) and 
(A5) solves to 

(n~'”[x —cr^k]—n)^ 

, (A7) 

In analogy to Eq. (4), the likelihood terms then become 

Ci= dxp{xi\x)p^i(x.) = e , (A8) 

y27rs2__. 

where s5.,i = crl + n'''Cjn and = Xi — cr^k. Up to an 
additive constant, the total likelihood function InT = 
InTi reads 


Pm' (x) = 






W 2 ^ - nf 

In(sx.i) + ^-2-^ 


(A9) 


This likelihood is identical to that of the uniform case 
(Eq. (5)), as long as we substitute the positions Xi for ^i. 
Upon performing this substitution, all results of Sections 2.1 
to 2.3 remain valid in the case of a power-law sampling. Inci¬ 
dentally, it is easy to see that = x^ if the data are sampled 
uniformly (k = 0). 

The HYPER-FIT package allows for the addition of such 
a power-law selection function via the k.vec vector input 
argument in the hyper.like and hyper.fit functions. 


C Hierarchical implementation of 2D 
Hyper-Fit using JAGS 

Over recent years a number of hierarchical Bayesian infer¬ 
ence software programs have become popular, e.g. Bayesian 
inference Using Gibbs Sampling (BUGS^°), Just Another 
Gibbs Sampler (JAGS^^) and Stan^^. They are all some¬ 
what similar in terms of the modelling language, with an 
emphasis on constructing complex hierarchical models with 
relative ease (Gelman & Hill 2006). An example 2D imple¬ 
mentation of the generative hyper-fit model can be con¬ 
structed in JAGS using the following model specification: 


10 WWW. mrc-bsu. cam .ac.uk/software / bugs / 
^ ^ mcmc-jags.sourceforge.net / 
^^mc-stan.org 
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model { 

for ( i in 1 :N) { 

xyobs [ 1: 2 , i 1 ~ dmnormf xytrue [1:2,11 , 

errors [1:2 ,1:2 , i ]) 

} 

for (i in 1 :N) { 

errors [ 1 ,1 , i ] <— 1/xsd [ i ]"2/(1 — rho [ i ] " 2) 

errors [2,2 , i ] <— 1/ysd [ i ]"2/(1 — rho[i]" 2) 

errors [ 1 , 2 , i ] <— rho [ i ] / xsd [ i ] / 

ysd[i]/(l-rho[i]^2) 
errors [2 ,1 , i ] <— rho [ i ] / xsd [ i ] / 

ysd [ i]/(l-rho [ i ] '2) 

} 

for ( i in 1 :N) { 

xytrue[l,i] <— dist fr omor igin [ i ] * 
cos (angle *0.01745)— 
sin(angle*0.01745)*scat [i] 
xytrue[2,i] <— dist fr omor igin [ i ] * 
sin (angle *0.01745) + 
cos(angle*0.01745)*scat [i] + 
originy 

} 

for ( i in 1 :N) { 

dist fr omor igin [ i ] ~ dunif ( —1000,1000) 

scat[i] ~ dnorm (0,1/intrinvar) 

} 

angle ~ dunif (—90 ,90) 

originy ~ dunif ( —1000,1000) 

intrinvar ~ dunif (0,10000) 

} 

In the above 2D model the observed x-y data is contained 
in the 2 x N matrix xytrue, the x errors are in the N vec¬ 
tor xsd, the y errors are in the N vector ysd, and the error 
correlations between x and y are in the N vector rho. The ro¬ 
tation angle of the generative model, the offset of the model 
from the origin, and the intrinsic variance of the model are 
all sampled from uniform prior distributions in this exam¬ 
ple. The model specification using Stan or other BUGS style 
software programs is very similar in construct to the above. 

Compared to the hyper-fit software, the JAGS mod¬ 
elling solution converges slowly (factors of a few slower for 
equally well sampled posteriors) and tends to suffer from 
poor mixing between highly correlated parameters. This is 
a common problem with Gibbs sampling, and in practice 
the typical user can gain confidence in their fit solution us¬ 
ing HYPER-FIT by attempting a number of the available rou¬ 
tines, where e.g. Slice Sampling is an empirical approxima¬ 
tion to true Gibbs sampling. Fully hierarchical based solu¬ 
tions (e.g. BUGS, JAGS and Stan) are also hard to extend 
into arbitrary dimensions, and in general need to be care¬ 
fully hand-modified for bespoke datasets. However, these 
programs offer clear advantages in elegantly handling left 
or right censored data, with very little modification to the 
above specified model required. 



